Run ocrd network sample #449
Are there already other showcases and documents for |
Co-authored-by: Stefan Weil <[email protected]>
Thank you for your help. I'm afraid I don't know of any other example deployment for ocrd network yet. I have focused on the Docker deployment so far, because it needs no native installation of processors.
@stweil, there is also this pad with fast instructions for native environment that we were not able to put under |
logging hack no longer needed with slim containers
Thank you for your input, I think I included all parts in the last commit: d3fa81f. As already noted by bertsky, this now depends on OCR-D/core#1303
Getting better and better!
@@ -0,0 +1,192 @@
dest: docker-compose.yml
BTW, we could also split up the docker-compose.yml into a fixed (template) part and then include the generated configs. (So for example, we could include docker-compose.servers.yml and docker-compose.processors.yml.)
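A minimal sketch of the generated part, assuming the setup script renders Compose text with Python's str.format (which the doubled braces in `${{DATA_DIR_HOST}}` suggest). The function name, the service layout and the example image are hypothetical and only illustrate how docker-compose.processors.yml could be produced separately from a fixed template:

```python
# Hypothetical sketch: render only the generated processors part.
# SERVICE_TEMPLATE, render_processors and the example image are
# illustrative assumptions, not the PR's actual code.

SERVICE_TEMPLATE = """\
  {name}:
    image: {image}
    volumes:
      - "${{DATA_DIR_HOST}}:/data"
"""


def render_processors(processors):
    """Return Compose text with one service per (name, image) pair."""
    parts = ["services:\n"]
    for name, image in sorted(processors.items()):
        # str.format turns the doubled braces into a literal ${DATA_DIR_HOST},
        # which Compose later substitutes from the environment
        parts.append(SERVICE_TEMPLATE.format(name=name, image=image))
    return "".join(parts)


if __name__ == "__main__":
    print(render_processors({"ocrd-tesserocr-recognize": "ocrd/tesserocr"}))
```

The result could then be combined with the fixed part by passing several files to Compose, e.g. docker compose -f docker-compose.yml -f docker-compose.processors.yml up.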
We now have to start thinking about how to generate So far, we required a git checkout of all (enabled) submodules (i.e. |
volumes:
  - "${{DATA_DIR_HOST}}:/data"
@joschrew what about the processor resources, shouldn't that be a volume, too?
Preferably as named volume (so module-distributed and user-downloaded files can mix freely), e.g.:
volumes:
  - "${{DATA_DIR_HOST}}:/data"
  - ocrd-resources:/usr/local/share/ocrd-resources
Unfortunately, it seems we missed the opportunity of the latest sweep across modules to define a unique internal resource location. In ocrd/tesserocr, we already use /models as alias for /usr/local/share/ocrd-resources, but in many other images we still use the latter...
Perhaps brevity (/models instead of /usr/local/share/ocrd-resources) is not an issue if we are using scripted mounts and calls, anyway. And /models was a hack of sorts: it was not backed by the spec (which still says /usr/local is the system location, and $XDG_DATA_DIR the data location), but only a convention for our fat container images.
Therefore I will switch ocrd/tesserocr back to /usr/local/share/ocrd-resources as the only place to mount (as in all the other workable slim container images).
I have added a resource volume now. What would you suggest to get the resources into the volume? I tried using the resmgr, which fails because of missing write permission on /.config (the setup cannot use root permissions). Would it be a good idea to do this via the Makefile?
What would you suggest to get the resources into the volume? I tried using the resmgr, which fails because of missing write permission on /.config (the setup cannot use root permissions). Would it be a good idea to do this via the Makefile?
Ah, that bit us before in the fat containers, so I thought we found a remedy for it: setting XDG_CONFIG_HOME=/usr/local/share/ocrd-resources as well, so the volume should cover both the resources themselves and the config file ocrd/resources.yml.
Your /.config indicates that XDG_CONFIG_HOME is not set (as HOME=/). What image is this happening with?
So no, this should work on the user side already.
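To illustrate why /.config shows up, here is a small sketch of the fallback defined by the XDG Base Directory convention (an unset or empty XDG_CONFIG_HOME means $HOME/.config). The helper names are hypothetical, and this is not necessarily how ocrd resolves the path internally:

```python
import posixpath


def xdg_config_home(environ):
    """Resolve the config directory with the XDG Base Directory fallback."""
    value = environ.get("XDG_CONFIG_HOME")
    if value:
        return value
    # unset or empty: fall back to $HOME/.config
    return posixpath.join(environ.get("HOME", ""), ".config")


def resources_yml_path(environ):
    """Assumed location of ocrd/resources.yml below the config directory."""
    return posixpath.join(xdg_config_home(environ), "ocrd", "resources.yml")


if __name__ == "__main__":
    # container started as an arbitrary user with HOME=/ and no XDG_CONFIG_HOME
    print(xdg_config_home({"HOME": "/"}))
    # XDG_CONFIG_HOME pointing into the mounted resource volume
    print(resources_yml_path(
        {"HOME": "/", "XDG_CONFIG_HOME": "/usr/local/share/ocrd-resources"}))
```

With HOME=/ and no XDG_CONFIG_HOME, the config directory resolves to /.config, which an unprivileged user cannot create; with the variable set, the config file lands inside the mounted volume.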
All images. The setup is started as the current user; this is necessary because the created workspace files should be owned by the user. If the volume is owned by root (or is a named volume, which has the same permissions), the resource download can only be triggered by root. But when the container is started as the user and root is then used to download the resources, we get logfile permission errors again. The value of XDG_CONFIG_HOME has no effect on permissions.
Oh, right! We do have a problem with named volumes. They are always created as root, even with docker volume create. It seems the only way to get normal ownership is by choosing a container-side parent path which already has the permissions we want (as these get inherited by the named volume). But in cases where the directory itself already exists (from the Docker build), its ownership won't change (i.e. stay root). That is a big problem.
The value of XDG_CONFIG_HOME has no effect on permissions.
It does have the effect of controlling where ocrd/resources.yml gets written.
See moby/moby#3124
Okay, so the only way out AFAICS is setting /usr/local/share/ocrd-resources (and its subdirectories) to 777 in the Dockerfile already. This means in every Dockerfile.
You can test this:
- revert a9f2ffd
- write a Dockerfile.fixup, contents:
ARG BASE_IMAGE
FROM $BASE_IMAGE
RUN mkdir -p /usr/local/share/ocrd-resources
RUN find /usr/local/share/ocrd-resources -type d -exec chmod 777 {} ";"
RUN find /usr/local/share/ocrd-resources -type f -exec chmod 666 {} ";"
- use it to patch a module of your choice, e.g.:
cat Dockerfile.fixup | docker build --build-arg BASE_IMAGE=ocrd/tesserocr -t ocrd/tesserocr:permissions -
- use the patched image, e.g.:
docker run --rm -it -v ocrd-resources:/usr/local/share/ocrd-resources ocrd/tesserocr:permissions ocrd resmgr download ocrd-tesserocr-recognize fra.traineddata
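For illustration only, the effect of the three RUN lines above can be sketched in Python. The function name is hypothetical; the actual fix stays in the Dockerfile. The walk starts at the root directory itself, so the top-level directory is covered as well as its subdirectories:

```python
import os

DIR_MODE = 0o777   # directories: world-writable and traversable
FILE_MODE = 0o666  # regular files: world-readable and writable


def open_permissions(root):
    """Mimic mkdir -p plus the two find/chmod calls on a resource tree."""
    os.makedirs(root, exist_ok=True)
    # os.walk yields root first and does not descend into symlinked dirs
    for dirpath, _dirnames, filenames in os.walk(root):
        os.chmod(dirpath, DIR_MODE)
        for name in filenames:
            path = os.path.join(dirpath, name)
            # like find -type f: regular files only, symlinks skipped
            if os.path.isfile(path) and not os.path.islink(path):
                os.chmod(path, FILE_MODE)


if __name__ == "__main__":
    import tempfile

    demo = os.path.join(tempfile.mkdtemp(), "ocrd-resources")
    open_permissions(demo)
    print(oct(os.stat(demo).st_mode & 0o777))
```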
I tried your suggestion, it nearly works for me (setting /usr/local/share/ocrd-resources itself to 777 is also necessary).
Another possibility would be to have more than one folder for the resources (kind of what we have currently). /usr/local/share/ocrd-resources could be used for preinstalled resources, and the "on-demand" resources could be installed to another folder (XDG_DATA_HOME/ocrd-resources, for example), for which a host-mounted volume could be used. I'd prefer one folder for all resources, but setting global permissions might raise security concerns again, like it did when we talked about setting permissions for the logging folders. Did we have a reason why we don't want to use XDG_DATA_HOME/ocrd-resources?
Edit: suggestion works as expected, I forgot to reset (delete) the mounted named volume
it nearly works for me (setting /usr/local/share/ocrd-resources itself to 777 is also necessary).
Are you sure you tried the exact command above (which should already cover the top-level directory as well)?
Another possibility would be to have more than one folder for the resources [...]
Did we have a reason why we don't want to use XDG_DATA_HOME/ocrd-resources?
Yes, we did have a reason. Not all processors can handle more than one location. Tesseract, for example, cannot be made to look in more than one directory.
It's also not intuitive. You don't want to have to think about resources in terms of where they came from.
This is still an open issue. Currently, a copy of ocrd-all-tool.json is simply used. But maybe this is the best way for now.
This PR showcases the usage of ocrd network.
Step by step guide to run the example
- Install python3-click (sudo apt install python3-click)
- Create the data directories: mkdir -p /tmp/mydata/ocrd-resources. This can be configured later. The folders must be owned by the current user
- Run make network-setup (or with a Python version set: make network-setup PYTHON=python3.9) to create required files and a venv that can be used as a client
- Run make network-start to start the Docker containers (make network-stop and make network-clean to tear down)
- Activate the venv with . run-network/venv/bin/activate and run the workflow with: ocrd-process -m /data/vd18test/mets.xml -w workflow.txt